(54) Title: INSTRUCTION PREDICTION BASED ON FILTERING
(57) Abstract

Instruction prediction based upon confidence and priority
levels. A filtering effect is achieved by selecting, from a
plurality of predictors (302, 304, 306, 308), a predictor
having 1) a confidence level satisfying a predetermined
threshold value and 2) the highest priority level among the
plurality of predictors (302, 304, 306, 308). A default
criteria is provided should no predictor satisfy this
criteria. Efficient use of predictor memory is achieved
through selective updating of predictors.

INSTRUCTION PREDICTION BASED ON FILTERING

BACKGROUND OF THE INVENTION
The present invention relates generally to the
field of computer-instruction prediction and, in
particular, to instruction prediction based on filtering.

Branch prediction, a particular type of
instruction prediction, has become critical to the
performance of modern pipeline microprocessors. As
pipelines grow in length, instruction fetch (performed in
one stage of a pipeline) moves farther away from
instruction execution (performed in another stage of the
pipeline). Conditional branches (also referred to as
conditional jumps) are one of the few operations where
instruction execution affects instruction fetch. If
instruction fetch must wait for execution of a
conditional branch before proceeding, considerable
performance is lost due to the number of pipeline stages
between the two. As a result, conditional branches are
typically predicted in an instruction fetch unit as taken
or not-taken with a mechanism independent of instruction
execution. Based on this prediction, subsequent
instructions are speculatively fetched.

However, branch prediction is often wrong. In
many cases, therefore, speculative instructions
predictively fetched must be "killed" and instructions
from the correct path subsequently fetched as
replacements. Thus, the misprediction rate of a branch
predictor is a critical parameter for performance.
(Another important parameter is the cost of a
misprediction, which is usually related to the number of
pipeline stages between fetch and execution.)

Fig. 1 illustrates the general interface
between a conventional branch predictor 102 and a
conventional microprocessor or any other computer system
in which predictor 102 may reside (referred to herein as
a "host processor" 103). Typically, branch predictor 102
resides within a host processor. However, for ease of
discussion, Fig. 1 shows predictor 102 coupled to host
processor 103. Standard control signals between
predictor 102 and processor 103, well known to those
having ordinary skill in the art, are omitted for clarity
of discussion.

Through the use of a program counter (not
shown), host processor 103 supplies a conditional branch-
instruction address or portion thereof (i.e.,
"BranchPC" 104), and the predictor responds with a
prediction (also referred to as a "prediction value") 106
and some state information; i.e., StateOut 108. This
state information is associated with a particular
BranchPC and includes information necessary to update
predictor 102 after an associated conditional branch
instruction is executed.

More specifically, upon execution of the
associated conditional branch instruction (i.e., when the
subject condition becomes known), processor 103 generates
an actual outcome value 110 (e.g., a single bit
indicating whether the branch is taken or not-taken) and
returns this to predictor 102 along with StateIn 108'
through a feedback loop 105. StateIn 108' is the same
information provided as StateOut 108 for the particular
BranchPC 104; this information has been maintained within
processor 103 until the associated conditional branch
instruction has been executed and outcome value 110 is
available. Predictor 102 will use StateIn 108' for
updating purposes if necessary. For example, StateIn
108' and StateOut 108 (i.e., state information) may
include an address for a memory (i.e., table) within
predictor 102 that is associated with the subject
conditional branch instruction, and is used to store the
associated outcome value 110 within the memory. An
example of a processor having a branch predictor disposed
within it is the MIPS R10000 microprocessor created by
Silicon Graphics, Inc., of Mountain View, California.

Methods for branch prediction are evolving
rapidly because the penalty for misprediction and
performance requirements for processors are both
increasing. Early branch prediction simply observed that
branches usually go one way or the other, and therefore
predicted the current direction (i.e., taken/not-taken)
of a conditional branch to be the same as its previous
direction; so-called "last-direction prediction." This
method requires only one bit of storage per branch.

On a sample benchmark (i.e., the 126.gcc
program of SPECint95 available from the Standard
Performance Evaluation Corporation) simulating a
predictor with a 4KB table (i.e., a memory disposed
within the predictor for holding predictions associated
with particular conditional branch instructions), such
last-direction prediction had a 15[...]% misprediction
rate per branch.

A simple improvement to last-direction
prediction is based on the recognition that branches used
to facilitate instruction loops typically operate in a
predictable pattern. Such branches are typically taken
many times in a row for repeated execution of the loop.
Upon reaching the last iteration of the loop, however,
such branch is then not-taken only once. When the loop
is re-executed, this cycle is repeated. Last-direction
prediction mispredicts such branches twice per loop:
once at the last iteration when the branch is
subsequently not-taken, and again on the first branch of
the next loop, when the branch is predicted as not-taken
but is in fact taken.

Such double misprediction can be prevented,
however, by using two bits to encode the history for each
branch. This may be carried out with a state machine
that does not change the predicted direction until two
branches are consecutively encountered in the other
direction. On the sample benchmark, this enhancement
lowered the simulated misprediction rate to 12.1%. This
predictor is sometimes called "bimodal" in the
literature.
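
The difference between the two schemes on a loop branch can be illustrated with a small simulation. The following Python sketch is not taken from the specification: it assumes a 2-bit saturating counter as one common realization of the two-bit state machine, assumes both predictors start out predicting taken, and uses a made-up trace of a loop branch taken nine times and then not taken once.

```python
def last_direction_mispredicts(outcomes):
    """Count misses when each branch is predicted to repeat its previous direction."""
    last = True  # assumed initial state: predict taken
    misses = 0
    for taken in outcomes:
        if taken != last:
            misses += 1
        last = taken
    return misses


def two_bit_mispredicts(outcomes):
    """Count misses for a 2-bit saturating counter (0..3); predict taken when >= 2."""
    counter = 3  # assumed initial state: strongly taken
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        # Move one step toward the observed direction, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


# Hypothetical trace: a loop branch taken 9 times, then not taken, run 3 times.
loop_trace = ([True] * 9 + [False]) * 3

print(last_direction_mispredicts(loop_trace))  # 5
print(two_bit_mispredicts(loop_trace))         # 3
```

Under these assumptions the two-bit scheme misses only once per loop (at the exit), while last-direction prediction also misses on re-entry.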

Additional improvements to branch prediction
include the use of global and/or local "branch history"
to pick up correlations between branches. Branch history
is typically represented as a finite-length shift
register, with one bit for each taken/not-taken outcome
shifted into the register each time a branch is executed.
Local history uses a shift register per branch and
exploits patterns in the same to make predictions. For
example, given the pattern 10101010 (in order of
execution from left to right) it seems appropriate to
predict that the next branch will be taken (represented
by a logic one). Global history, on the other hand, uses
a single shift register for all branches and is thus a
superset of local history.
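
As an illustration only, a branch-history shift register might be modeled as follows. The class name, the 8-bit length, the all-zero initial state and the branch addresses used as dictionary keys are assumptions made for this sketch, not details from the specification.

```python
class HistoryRegister:
    """Finite-length shift register holding one bit per branch outcome."""

    def __init__(self, length):
        self.length = length
        self.bits = 0  # assumed initial state: all not-taken

    def shift_in(self, taken):
        """Shift the newest outcome in on the right; the oldest bit drops off."""
        mask = (1 << self.length) - 1
        self.bits = ((self.bits << 1) | int(taken)) & mask

    def pattern(self):
        """Return the history as a bit string, oldest outcome first."""
        return format(self.bits, "0{}b".format(self.length))


# Global history: a single register shared by every branch.
global_history = HistoryRegister(8)

# Local history: one register per branch, keyed by a hypothetical branch address.
local_history = {}


def record(branch_pc, taken):
    """Record one executed branch in both the global and the local history."""
    global_history.shift_in(taken)
    local_history.setdefault(branch_pc, HistoryRegister(8)).shift_in(taken)
```

Feeding the outcomes 1, 0, 1, 0, 1, 0, 1, 0 for a single branch leaves both registers holding the pattern 10101010 discussed above.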

A number of methods have been suggested for
utilizing history in branch prediction. Two
representative methods for local and global history are
called "PAG" and "GSHARE," respectively. These methods
are further described in one or more of the following:
Yeh, et al., "A Comparison of Dynamic Branch Predictors
That Use Two Levels of Branch History," The 20th Annual
International Symposium on Computer Architecture, pp.
257-266, IEEE Computer Society Press (May 16-19, 1993);
Yeh, et al., "Alternative Implementations of Two-Level
Adaptive Branch Prediction," The 19th Annual
International Symposium on Computer Architecture, pp.
124-134, Association for Computing Machinery (May 19-21,
1992); and S. McFarling, "Combining Branch Predictors,"
WRL Technical Note TN-36, Digital Equipment Corp. (1993)
("McFarling"), each of which is hereby incorporated by
reference in its entirety for all purposes.

On the sample benchmark, PAG and GSHARE lowered
the simulated misprediction rate to 10.3% and 8.6%,
respectively. In general, global history appears to be
better than local history because the history storage is
only a few bytes, leaving more storage for predictions.

A further improvement to branch prediction is
achieved by combining two different predictors into a
single branch prediction system, as described in
McFarling. The combined-predictor system of McFarling
runs two branch predictors in parallel (i.e., bimodal and
GSHARE), measures which one is better for a particular
conditional branch, and chooses the prediction of that
predictor. On the sample benchmark, a combined-predictor
system using bimodal and GSHARE achieved a simulated
mispredict rate of 7.5%.

Another variation to branch prediction is
suggested in E. Jacobsen, et al., "Assigning Confidence
to Conditional Branch Prediction," Proceedings of the
29th Annual IEEE/ACM International Symposium on
Microarchitecture, IEEE Computer Society Press, pp. 142-
152 (December 2-4, 1996) ("Jacobsen"), which is hereby
incorporated by reference in its entirety for all
purposes. Jacobsen describes a method for determining a
"confidence level" for a given branch prediction.
Jacobsen suggests that confidence signals may be used,
for example, to select a prediction in a system that uses
more than one predictor.

One suggested confidence-level measure is
a resetting counter, which increments on each
correct prediction (but stops at its maximum value), and
is reset to zero on a misprediction. (This resetting
counter may be a saturating counter; i.e., one that does
not decrement past zero nor increment past its maximum
value.) Larger counter values indicate greater
confidence in a prediction. Exemplary pseudocode for
this confidence-level measure is provided in Table 1
below.



Table 1.

Confidence:
    conf ← count = countMax          high confidence if count at maximum
Update:
    if actual = prediction then
        if count < countMax then     increment count if correct,
            count ← count + 1        saturate at maximum
        endif
    else
        count ← 0                    reset count if incorrect
    endif
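
The pseudocode of Table 1 can be transcribed into a short runnable sketch. The Python below is offered only as an illustration; the class name and the default maximum of 15 are assumptions, since Table 1 does not fix a value for countMax.

```python
class ResettingCounter:
    """Resetting confidence counter following the pseudocode of Table 1."""

    def __init__(self, count_max=15):
        # count_max is an illustrative choice; Table 1 leaves countMax open.
        self.count_max = count_max
        self.count = 0

    def confident(self):
        """High confidence only when the counter sits at its maximum."""
        return self.count == self.count_max

    def update(self, actual, prediction):
        """Increment (saturating) on a correct prediction, reset on a wrong one."""
        if actual == prediction:
            if self.count < self.count_max:
                self.count += 1
        else:
            self.count = 0
```

A single misprediction drops the counter to zero, so confidence is regained only after count_max consecutive correct predictions.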







The foregoing discussion is directed primarily
to maintaining a prediction state or history per branch
instruction. In practice, however, such information is
kept in fixed size memories (i.e., "tables"). The
information is typically untagged and so prediction data
for multiple conditional branches often share the same
location in the tables undetected. When this happens, it
usually increases the misprediction rate. The more
advanced methods store more information per branch, and
so there is a tension between the reduction in the
mispredict rate from the additional information and the
increase in the mispredict rate due to increased sharing.

A combined predictor, as described in
McFarling, that chooses between GSHARE and bimodal can
take advantage of the fact that sometimes history helps
to predict a given branch, and sometimes history is not
relevant and may actually be harmful. Such predictor
operates by running both predictors in parallel and
choosing the better one. Selection criteria for choosing
an acceptable prediction may be a confidence level. In
such a situation, however, both predictors (and the
chooser) consume costly table space, even when the
prediction of one predictor or the other is almost never
used for certain branches. The extra table space
consumed by the unused predictor increases false sharing
(i.e., the use of a prediction for one branch instruction
by another), and thus reduces accuracy.

Moreover, selection criteria based solely on a 
confidence level may be inadequate when, for example, 
more than one predictor is sufficiently confident. There
is a need for distinguishing between multiple predictor
alternatives that may be uniformly deemed sufficiently
confident (and therefore acceptable).

Accordingly, it would be desirable to have a 
predictor system and method that efficiently uses table 
space for servicing instructions that utilize prediction 
information, such as conditional branches, to reduce 
false sharing, and thereby increase prediction accuracy. 
Further, it would be desirable to have a prediction 
system that distinguishes among a plurality of choices 
that are each deemed acceptable through a confidence 
level or other acceptance-testing mechanism.

SUMMARY OF THE INVENTION
The invention provides method and apparatus for
generating predictions that efficiently use table space
for servicing conditional instructions. Further, the
invention provides a system that prioritizes and thereby
distinguishes predictions, each of which may be deemed
equally acceptable to use through a confidence level or
any other acceptance-testing mechanism.

In a first embodiment, a system is provided
that generates a prediction for a given situation. This
system includes a plurality of predictors generating a
plurality of prediction values for the given situation,
means for processing said plurality of prediction values
to produce the prediction, and a feedback loop coupled to
the plurality of predictors for updating only a portion
of the predictors based upon an actual outcome of the
given situation.

In another embodiment, a method is provided
that generates a prediction for a given instruction.
This method includes the steps of providing a plurality
of predictors [...] information of the
instruction and producing a prediction value by at least
one predictor of the plurality of predictors. Further,
this method also includes processing the prediction value
to generate the prediction, and updating only a portion
of the predictors with actual outcome information
provided from execution of the given instruction.

In yet another embodiment, a predictor system
is provided that generates a desired prediction for a
given instruction. This system includes a plurality of
predictors generating a plurality of predictions, each
predictor being assigned a priority level and at least
one predictor being operable to indicate acceptability of
its prediction. Coupled to the plurality of predictors
is a selection circuit which selects the desired
prediction from a desired predictor. In accordance with
this system, the desired predictor is (1) a first
predictor when such predictor indicates acceptability of
its prediction and has a highest assigned priority level
among any other predictor of the plurality of predictors
that also indicates acceptability of its respective
prediction; and (2) a second predictor when none of the
plurality of predictors indicate acceptability of its
prediction, this second predictor having a lowest
assigned priority level.

Existing host processors are easily modified to
incorporate the predictor system of the present
invention. Moreover, such predictor system accommodates
further enhancements to the host processor such as trace
caches (which may be controlled by confidence levels) at
relatively low cost.

A further understanding of the nature and
advantages of the invention may be realized by reference
to the remaining portions of the specification and
drawings. Like reference numbers in the drawings
indicate identical or functionally similar elements.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 illustrates the general interface
between a conventional branch predictor and a
conventional host processor.

Fig. 2 is a block diagram illustrating the
conceptual flow of conditional branch instructions
through a branch predictor system using filtering in
accordance with the principles of the invention.

Fig. 3A is a block diagram of an embodiment of
a branch predictor system using parallel-accessed
predictors and filtering in accordance with the
principles of the invention.

Fig. 3B is a flow chart of a method for
generating a desired prediction for a given branch
instruction in accordance with the principles of the
invention.

Fig. 4 is a block diagram of a preferred
embodiment of a branch predictor system using filtering
in accordance with the principles of the invention.

Fig. 5 is a block diagram of a global-history
shift register used in the predictor system of Fig. 4.

Fig. 6 is a block diagram of a hash unit used
in the predictor system of Fig. 4.

Fig. 7 is a block diagram of a first update 
circuit used in the predictor system of Fig. 4. 

Fig. 8 is a block diagram of a second update
circuit used in the predictor system of Fig. 4.

Fig. 9 illustrates state output signals
generated by the predictor system of Fig. 4.

Fig. 10 illustrates state input signals
received by the predictor system of Fig. 4.

Fig. 11 illustrates trace-driven simulation
results from a number of predictors and predictor
systems.

Fig. 12 is a simplified block diagram of a host
processor that utilizes the predictor system of Fig. 4.

DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS
The following embodiments are directed to
systems and methods servicing conditional branch
instructions. However, the present invention is
applicable to any operation or situation where prediction
information may be used.



A. System 200

Fig. 2 is a block diagram illustrating how
conditional branch instructions conceptually flow through
a branch predictor system 200 that uses filtering in
accordance with the principles of the present invention.
Like predictor 102 of Fig. 1, system 200 interfaces with
a conventional host processor (not shown). In accordance
with conventional branch prediction operation, system 200
supplies a prediction (also referred to as a "prediction
value") and state information to the host processor, to
enable the processor to predict a pending conditional
branch instruction. In response, the processor supplies
an actual outcome value (indicating whether a branch is
taken or not-taken) and state information to enable the
predictor system to update if necessary.

As shown in Fig. 2, a number of individual
constituent predictors 202, 204, 205 and 206 are serially
disposed within system 200. Each constituent predictor
may be constructed from a conventional branch predictor
modified to generate confidence levels in accordance with
methodologies described herein. Accordingly, each
predictor generates a branch prediction "P" and
confidence level "C" for a received BranchPC. Predictors
202-206 are hierarchically arranged in a gradually
decreasing "priority level" from left to right. The
priority level assigned to each predictor defines the
relative preference of a particular predictor among
all others in system 200, subject to satisfying
additional criteria (i.e., confidence level) as described
below.

In operation, the prediction for a given branch
instruction will be provided by system 200 from a
predictor with the highest priority that has a confidence
level satisfying a predetermined threshold value ("PTV").
A confidence level may "satisfy" PTV by, for example, (1)
having a value that is equal to PTV, (2) having a value
that is greater than or equal to PTV or (3) having a
value that satisfies any other test applied to a given
situation. Should a confidence level "satisfy" PTV, the
associated prediction is considered acceptable for use,
although its relative desirability (with respect to
output from other predictors) is dependent upon the
priority level of the associated predictor.

Should no confidence level satisfy PTV for the
given branch, a prediction from the predictor of lowest
priority shall be used by default. Accordingly,
referring to system 200 of Fig. 2, the number of branch
instructions available for prediction by a given
predictor from left to right likely decreases in
accordance with decreasing priority levels. Predictor
202, having the highest priority, will conceptually
consider all (i.e., N) branches in a given application
and service those for which it is sufficiently confident;
i.e., the confidence level C of the predictor for the
branch instruction being considered satisfies a PTV.

However, predictor 204, having a lower
priority, will conceptually consider only those branches
not serviced by preceding predictor 202 (i.e., N-N1).
Again, this predictor will only service those branch
instructions for which it is sufficiently confident.
Further, predictor 206, having the lowest priority, will
conceptually consider only those branches not serviced by
any preceding predictor. This predictor will service all
branch instructions regardless of the corresponding
confidence level. As such, the series of predictors
202-205 "filters" branch instructions using confidence
and priority levels to select a predictor of highest
possible priority for any given instruction. Any
residual is serviced by predictor 206. The use of
predictors to selectively service branch instructions
based on confidence and priority levels is referred to
herein as "branch-prediction filtering."

Referring again to Fig. 2, first predictor 202
yields a prediction "P1" and a prediction confidence level
"C1" for a given branch instruction. If C1 satisfies a
PTV (e.g., if C1 is greater than or equal to PTV),
predictor 202 is desired for supplying a prediction.
Accordingly, P1 is selected and forwarded to the host
processor to predict the given branch, and the remaining
downstream predictors are ignored. However, if C1 does
not satisfy PTV (e.g., is less than PTV), the next
predictor 204 in the chain is evaluated for selection and
use. Conceptually, this serial process continues down
the chain of predictors until either a sufficiently high
confidence level is found, or the final predictor (i.e.,
predictor 206) is reached. If the final predictor is
reached, this predictor becomes desired for supplying a
prediction and the associated prediction "Pn" is selected
for the given branch regardless of the confidence level
"Cn".

Prediction methodology applied to system 200 of
Fig. 2 is summarized in the pseudocode of Table 2.

Table 2: Prediction Methodology.

    pred1, conf1, state1 ← Predictor1(BranchPC)
    pred2, conf2, state2 ← Predictor2(BranchPC)
    predN, confN, stateN ← PredictorN(BranchPC)
    if conf1 then
        pred ← pred1
    elseif conf2 then
        pred ← pred2
    else
        pred ← predN
    endif
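
The selection step of Table 2 generalizes naturally to a list of stages. The Python sketch below is illustrative only: the function name and the representation of each stage as a (prediction, confident) pair are assumptions, and the confidence flag is taken to mean that the stage's confidence level already satisfies the PTV.

```python
def select_prediction(stages):
    """Pick the prediction of the highest-priority confident stage.

    stages: list of (pred, conf) pairs ordered from highest to lowest
    priority, where conf is True when the stage's confidence level
    satisfies the threshold. Returns (index_of_selected_stage, pred).
    """
    # Every stage except the last must be confident to be selected.
    for index, (pred, conf) in enumerate(stages[:-1]):
        if conf:
            return index, pred
    # No earlier stage qualified: the final stage is used by default,
    # regardless of its own confidence flag.
    last = len(stages) - 1
    return last, stages[last][0]


# Stage 0 is not confident, stage 1 is, so stage 1 supplies the prediction.
print(select_prediction([(True, False), (False, True), (True, False)]))  # (1, False)
```

Returning the index of the selected stage alongside the prediction is a convenience for this sketch; it is the information an update step would need to know which stages to update.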







Referring to the pseudocode in Table 2,
prediction ("pred#"), confidence level ("conf#"), and
state information ("state#") variables for each predictor
stage are initially assigned values in parallel.
Beginning with confidence level C1 ("conf1"), the
confidence level of each predictor is evaluated to
determine whether it satisfies (i.e., is greater than or
equal to) the PTV. If the test is successful, the
associated predictor is chosen and the predictor-specific
prediction value ("pred#") is output as a system-level
output; otherwise, the next lower
predictor in priority is evaluated (pred#+1).
Ultimately, if no confidence level satisfies the PTV,
then final predictor 206 ("predN") is selected by
default.

In accordance with the foregoing discussion,
confidence level Cn of last predictor 206 need not be
measured since the prediction value Pn generated by this
unit is utilized by default if no other predictor
satisfies PTV. However, it may still be desirable to
determine Cn for purposes other than selecting a
predictor. For example, a supporting host computer may
be configured to allow for the fetching of one or two
instruction paths of a given conditional branch
instruction. A confidence-level criteria may be used to
trigger the fetching of the two paths, as described in
Jacobsen.

As discussed above, system 200 provides for a
hierarchical filtering operation where a given
conditional branch instruction passes from one predictor
to the next (moving downward in priority) until a
predictor with a sufficiently high confidence level (Cx)
is encountered or last predictor 206 is reached. Such
filtering of branch instructions provides a mechanism for
control over prediction operations. For example, a
predictor type (e.g., last-direction, GSHARE, etc.)
likely to have a sufficiently high confidence level to
service a large volume of branch instructions or a
particular type/class of instructions may be
strategically placed upstream in system 200. Such
positioning may help prevent the passing of certain
branch instructions to predictors downstream where such
instructions might disrupt or introduce algorithmically
undesirable branches for a particular prediction scheme
in the downstream stages.

Moreover, system 200 is highly modular and
therefore easily expandable with additional stages. This
modularity may be utilized to include, for example,
additional predictors that service specific conditional
branch instructions. Such specialty predictors may be
assigned lower priority assuming the specific conditional
branch instructions targeted by these predictors are
unlikely to be serviced by predictors placed upstream.

Referring again to Fig. 1, after the actual
outcome value 110 of a conditional branch instruction is
determined, this information (and supporting state
information 108) is returned to predictor 102 through
feedback loop 105 for any necessary updating (in
accordance with the prediction method employed by this
predictor). This updating operation is required by
the predictor system 200 of Fig. 2.

Referring to Fig. 2, if predictor 202 (i.e.,
the predictor having the highest priority) is selected to
carry out a prediction operation, an actual outcome value
and state information (provided by a host processor) is
conceptually forwarded only to predictor 202 for updating
purposes. However, if a predictor of lower priority
(e.g., 204 or 206) is selected for prediction operations,
then any predictor residing upstream (i.e., having
higher priority) of this selected predictor as well as
the selected predictor itself are conceptually provided
with an actual outcome value and associated state
information for updating (if necessary). Each predictor
will receive the same actual outcome value (i.e., value
110 of Fig. 1). However, individualized state
information associated with each predictor will be
returned to the predictor from which it originated. As
discussed below, this state information may include
addresses for table locations disposed within each
predictor in which the actual outcome value is to be placed.
Update methodology applied to branch predictor
system 200 of Fig. 2 is illustrated in the pseudocode of
Table 3.



WO 99/14667                                    PCT/US98/19674



Table 3: Update Methodology.

    Update1(actual, state1)
    if not conf1 then
        Update2(actual, state2)
        if not conf2 then
            Update3(actual, state3)
            if not confN-1 then
                UpdateN(actual, stateN)
            endif
        endif
    endif
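
The nested conditionals of Table 3 amount to a loop that stops at the selected predictor. The following Python sketch is an illustration under assumptions: the function name, the list-based arguments and the returned list of updated indices are all hypothetical conveniences, and each entry of update_fns stands in for one of the Update1..UpdateN subroutines.

```python
def update_stages(update_fns, confs, states, actual):
    """Update the selected predictor and every higher-priority predictor.

    update_fns: per-stage update callables, highest priority first.
    confs:      per-stage flags, True if that stage's confidence level
                satisfied the threshold when the prediction was made.
    states:     per-stage state information returned by the host.
    actual:     the actual branch outcome, shared by all stages.
    Returns the indices of the stages that were updated.
    """
    updated = []
    for index, update in enumerate(update_fns):
        update(actual, states[index])
        updated.append(index)
        is_last = index == len(update_fns) - 1
        # Stop at the stage whose prediction was used: the first confident
        # stage, or the final stage when none was confident.
        if is_last or confs[index]:
            break
    return updated
```

Stages below the selected one are never called, which is the property the text relies on to keep their tables free of outcomes from branches serviced upstream.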



Referring to Table 3, update methodology begins
with the update of predictor 202 via
subroutine Update1. Actual outcome value ("actual") and
state information associated with predictor 202
("state1") are returned to predictor 202 for updating if
necessary. If confidence level C1 ("conf1") did not
satisfy the PTV for system 200, the next predictor 204 is
also updated via subroutine Update2. This process
continues until a predictor is reached whose confidence
level satisfied the PTV or upon reaching final predictor
206. Significantly, as noted above, predictors assigned
lower priority than the selected predictor are not
updated during this process.

A selected predictor (i.e., a predictor whose
prediction is selected to carry out a branch prediction
operation) and only those predictors residing upstream of
the selected predictor (i.e., those predictors having a
higher priority level) utilize update information in
response to execution of a given conditional branch
instruction. Accordingly, table space in predictors
located downstream of a selected predictor is not wasted
on actual outcome values generated by branch instructions
serviced at higher priority predictors. Therefore,
predictors of lower priority are disposed to be more
application specific to the particular branch
instructions they service since update information comes
predominantly from these instructions. In other words,
by reducing the number of branch instructions updating
downstream predictors, there is less data in the tables
of these predictors and therefore less likelihood of
false sharing.

System 200 graphically illustrates branch
prediction filtering through serial operation. Although
this provides a convenient model to describe the
conceptual flow and processing of conditional branch
instructions, it is preferred that predictors be accessed
in parallel. Figs. 3 and 4 illustrate branch predictor
systems having such parallel-accessed predictors.

B. System 300 

Fig. 3A is a block diagram of an embodiment of
a branch predictor system 300 using parallel-accessed
predictors and filtering in accordance with the
principles of the present invention. The data inputs
from a conventional host processor 103 (i.e., BranchPC
104, StateIn 316' and actual outcome value 110) and
outputs to host processor 103 (i.e., StateOut 316 and
PredictOut 313) convey the same information the
individual constituent predictors would otherwise require
and generate, respectively, when interfacing with a host
processor. Standard signals from processor 103,
well known to those having ordinary skill in the art, are
not shown. System 300 is preferably disposed within host
processor 103, but for ease of discussion Fig. 3A shows
system 300 coupled to processor 103.

Referring to Fig. 3A, a program counter 317 is
coupled in parallel to several predictors 302-308.
Similarly, input lines 350, 352 from processor 103
conveying StateIn 316' and actual outcome value 110,
respectively, are coupled in parallel to predictors
302-308. Lines 350, 352 make up a feedback loop 354.
Each predictor generates state information ("S") and a
prediction ("P", also referred to as a "prediction
value") from any well-known branch prediction method
(e.g., last-direction, bimodal, PAG, GSHARE, etc.).

Additionally, each predictor except for final
predictor 308 generates a confidence level indicator
("CI") indicating whether the confidence level ("C") of
that particular predictor satisfies the PTV for system
300 ("system 300 PTV") and therefore the associated
prediction is acceptable to use. Computation of
confidence level may be carried out, for example, in
accordance with the pseudo-code described above in
Table 1 or below in Table 5. Determination of whether a
confidence level satisfies a PTV (i.e., generation of a
CI within each predictor) may be carried out with
discrete logic (e.g., like gate 480 of Fig. 4), a
conventional comparator or any like device as would be
apparent to one having ordinary skill in the art. (As
noted above, although a prediction may be considered
acceptable for use by its confidence level, its relative
desirability -- with respect to output from other
predictors -- is dependent upon the priority level of the
associated predictor.)

If the confidence level C for a particular
predictor is greater than or equal to system 300 PTV,
then the associated CI is output as a logic high or one
indicating acceptability of its associated prediction.
Alternatively, if such confidence level C is less than
system 300 PTV, the associated CI is output as a logic
low or zero indicating unacceptability of its prediction.
As an alternative embodiment, each predictor may be
assigned an individual predetermined threshold value
which must be satisfied to output a logic high CI. As an
additional alternative, each predictor may employ a
different method of computing a confidence level. As a
further alternative, criteria other than a confidence
level may be used to indicate acceptability of a
particular prediction.
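
The comparison that produces each CI can be sketched in a few lines of Python. This is only an illustration of the greater-than-or-equal test described above; the function name and the optional per-predictor threshold list are hypothetical and do not appear in the specification.

```python
def confidence_indicators(levels, system_ptv, per_predictor_ptv=None):
    """Return one CI bit per non-final predictor.

    levels            -- confidence levels C1..C(n-1), highest priority first
    system_ptv        -- single threshold shared by all predictors
    per_predictor_ptv -- optional individual thresholds (alternative embodiment)
    """
    if per_predictor_ptv is None:
        per_predictor_ptv = [system_ptv] * len(levels)
    # CI is logic high (1) when the level satisfies its threshold, else low (0).
    return [1 if c >= ptv else 0 for c, ptv in zip(levels, per_predictor_ptv)]
```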

Predictions P1-Pn from predictors
302-308, respectively, are forwarded to data inputs of
multiplexer 312. Confidence level indicators CI1-CI(n-1)
from predictors 302-306, respectively, are forwarded to
data inputs D1-D(n-1) of priority encoder 310. The output of
encoder 310 (Q1,Q0) is forwarded to the selection input of
multiplexer 312. Through the processing of multiplexer
312 and encoder 310 (as described below) a collection of
predictions (i.e., P1-Pn) is reduced to a single value
that is output to processor 103. In brief, multiplexer
312 and encoder 310 select the most desirable prediction
based on confidence-level and priority criteria.

Referring again to Fig. 3, state information S1-Sn
from predictors 302-308 is forwarded to state output
device 315, which simply outputs S1-Sn in concatenated
form to host microprocessor 103 as StateOut 316. Device
315 may be implemented with conventional logic, as would
be apparent to one having ordinary skill in the art.
This information is ultimately returned to system 300 as
StateIn 316' for updating predictors 302-308,
respectively.

More specifically, state information S1-Sn,
which is associated with a particular prediction and
conditional branch instruction, is temporarily maintained
in host processor 103 after the associated prediction is
generated and ultimately returned to predictor system 300
at the time that an actual outcome value 110 is generated
by the host processor for purposes of updating the
predictor stages. State information may, for example,
identify a location within a table of a predictor stage
that is to receive actual outcome value 110 as an update.
As shown in Fig. 1, the use of state information for
performing updates to branch predictors is a conventional
operation well known to those having ordinary skill in
the art. System 300 of Fig. 3A simply concatenates this
information to facilitate more than one predictor;
i.e., forming a state information vector. Each predictor
302-308 extracts the information with which it is associated
when this vector is returned to system 300 as StateIn
316'.

Priority encoder 310 receives confidence level
indicators CI1-CI(n-1) from predictors 302-306 and processes
the signals to effect a filtering result. Priority
encoders are well known. Encoder 310 may be
designed to function as any conventional encoder such as
the MC10H165 available from Motorola, Inc. A truth table
for the operation of encoder 310 is provided in Table 4
below, where "L" is a logic low or zero (0), "H" is logic
high or one (1) and "X" is a Don't Care. The "L" entries
in Table 4 represent logic low CIs which signify
confidence levels that do not satisfy (e.g., are less
than) system 300 PTV. Conversely, the "H" entries in
Table 4 represent logic high CIs which signify confidence
levels that satisfy (e.g., are greater than or equal to)
system 300 PTV.

Table 4: Truth Table For Encoder 310

As Table 4 illustrates, if no predictor 302-306
has a confidence level C that satisfies system 300 PTV
(i.e., D1=D2=D(n-1)=L), then predictor 308 is desired as a
prediction source. Accordingly, encoder 310 shall select
prediction Pn of predictor 308 by forwarding an address of
0 (i.e., Q1=0, Q0=0) to multiplexer 312 and thereby output
Pn as PredictOut 313. (PredictOut 313 is forwarded to a
fetch unit (not shown) within host processor 103.)
Alternatively, if any of predictors 302-306 has a
confidence level that satisfies system 300 PTV, then one
of those predictors shall be selected for providing
PredictOut 313.

Referring to Table 4 and Fig. 3A, predictor 302
is assigned highest priority by encoder 310. If the
confidence level of predictor 302 (C1) satisfies system
300 PTV (i.e., C1 is greater than or equal to system 300
PTV), then predictor 302 is desired as a prediction
source, regardless of the confidence levels of predictors
304 and 306. Accordingly, predictor 302 outputs a logic
high CI1, which is received at input D1 of encoder 310. As
a result encoder 310 shall select prediction P1 of stage
302 by forwarding an address of decimal 1 (i.e.,
Q1=0, Q0=1) to multiplexer 312 and thereby output P1 as
PredictOut 313. Alternatively, if confidence level
indicator CI1 is a logic low (i.e., C1 is less than system
300 PTV), and confidence level indicator CI2 is a logic
high (i.e., the confidence level of predictor 304, C2, is
greater than or equal to system 300 PTV), then predictor
304 is desired regardless of the value of confidence
level indicator CI(n-1). Accordingly, encoder 310 shall
select prediction P2 by forwarding an address of decimal 2
(i.e., Q1=1, Q0=0) to multiplexer 312 and thereby output P2
as PredictOut 313.

The foregoing description applies equally to
selecting predictor 306. Confidence level indicators
necessary to select the output of predictor 306 for a
particular branch instruction are shown in row 4 of
Table 4. Should system 300 include a greater number of
non-final predictors (i.e., more than 302-306), then
additional confidence level indicators "CI" would be
provided to encoder 310. The values of these additional
confidence level indicators follow the patterns set out
in Table 4 to select predictors identified in this table;
i.e., these additional values (between CI2 and CI(n-1)) would
be L, X, X and L to select predictors 308, 302, 304 and
306, respectively.
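
The selection behaviour described for encoder 310 and multiplexer 312 can be modelled in software as follows. This is a minimal sketch, not the circuit itself: the function names are hypothetical, and the mapping of address 3 to the third non-final predictor is an assumption inferred from the pattern of addresses 1 and 2 given above.

```python
from typing import Sequence

def priority_encode(ci: Sequence[int]) -> int:
    """Model of encoder 310: return the select address for multiplexer 312.

    ci[0] is CI1 (predictor 302, highest priority). Address 0 selects the
    final predictor; address k selects the k-th non-final predictor.
    """
    for k, indicator in enumerate(ci, start=1):
        if indicator:      # first logic-high CI wins; later CIs are Don't Care
            return k
    return 0               # no CI is high: default to the final predictor

def predict_out(predictions: Sequence[int], ci: Sequence[int]) -> int:
    """Model of multiplexer 312: predictions = [P1..Pn], ci = [CI1..CI(n-1)]."""
    address = priority_encode(ci)
    # Address 0 maps to Pn (final predictor); address k maps to Pk.
    return predictions[-1] if address == 0 else predictions[address - 1]
```

For example, with CI values (L, H, X) the encoder returns address 2 and the multiplexer forwards P2, whatever the value of the last indicator.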

As described above, selection of a branch
prediction (i.e., P1-Pn) within system 300 results in the
output of that prediction (i.e., PredictOut 313) and
associated state information (i.e., StateOut 316) to host
processor 103. Upon execution of the associated
conditional branch instruction by execution unit 319 in
processor 103, the actual outcome value 110 (e.g., a
single bit indicating whether the branch is taken or
not-taken) resulting from such execution is returned to
predictors 302-308 accompanied by the previously output
state information (StateOut 316) referred to now as
StateIn 316'. This state information is necessary to
carry out update operations as described below.

Like system 200, update operations are
performed only on the predictor selected to provide a
prediction and any predictor of higher priority.
Predictors of lower priority are not updated. Referring
to Fig. 3A, actual outcome value 110 and associated state
information for each predictor 302-308 are provided in
parallel via feedback loop 354 to predictors 302-308. As
described above in connection with system 200,
eliminating the need to update predictors having lower
priority reduces the load on these predictors (since less
data is being retained) and, therefore, reduces the
likelihood of false sharing.

Referring to Fig. 3B, flow chart 370 describes
a method for generating a desired prediction for a given
branch instruction according to the principles of the
present invention. In block 371, a plurality of
predictors are provided. These may be non-final
predictors 302-306 and a final predictor 308, as shown in
Fig. 3A.

Next, these predictors are assigned priority
levels in accordance with block 372. Priority assignment
may be carried out through digital hardware, such as by
coupling each predictor to a specific input of an encoder
as shown in Fig. 3A. Any other method for assigning
relational identifiers to components may also be used
(e.g., software or firmware control).

Pursuant to block 373, an address (i.e., a PC
value) for a given conditional branch instruction is
provided to each predictor to initiate processing. In
block 374, branch-prediction processing ensues; i.e., the
generation of branch predictions and confidence levels at
each predictor except, perhaps, the final predictor. As
noted above, the generation of a confidence level at the
final predictor is unnecessary to carry out the filtering
operation of the present invention.

In decisional block 376, a determination is
made as to whether any confidence level generated in
block 374 satisfies the associated PTV and is therefore
acceptable to use (subject to priority hierarchy).
Satisfaction of a PTV is a design parameter unique to a
particular system. A PTV may be satisfied, for example,
if a confidence level is greater than or equal to the
value of the PTV. Of course, any other comparative test
may be applied. In an alternative embodiment, each
predictor may have an individual PTV. If one or more
predictors provide confidence levels that satisfy the
associated PTV, the predictor with the highest priority
level is selected as the desired predictor with the
desired prediction pursuant to block 378.

Alternatively, if no predictor has a confidence
level that satisfies the PTV, then the final predictor
(e.g., predictor 308 of Fig. 3A) is selected by default
as the desired predictor having the desired prediction,
in accordance with block 380.

After execution of the predicted branch
instruction, the predictors are selectively updated
pursuant to block 382. Specifically, the desired
predictor providing the desired prediction and every
predictor having a higher priority than the desired
predictor is subject to updating where necessary.
Conversely, predictors with priorities lower than the
desired predictor are not updated with new prediction
values.
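
The method of flow chart 370 can be summarized with the following Python sketch. It is a software analogy under stated assumptions rather than the claimed hardware: each predictor is modelled as a hypothetical pair of callables (predict, update), list position stands in for priority (index 0 is highest, the last entry is the final predictor), and the PTV test is the greater-than-or-equal comparison given as an example above.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical predictor model:
#   predict(pc) -> (prediction, confidence), confidence is None for the final predictor
#   update(pc, actual) -> None
Predictor = Tuple[Callable[[int], Tuple[int, Optional[int]]],
                  Callable[[int, int], None]]

def select(predictors: List[Predictor], pc: int, ptv: int) -> Tuple[int, int]:
    """Blocks 373-380: return (index of desired predictor, desired prediction)."""
    outputs = [predict(pc) for predict, _ in predictors]   # parallel access
    for i, (prediction, confidence) in enumerate(outputs[:-1]):
        if confidence is not None and confidence >= ptv:
            return i, prediction        # highest-priority predictor satisfying PTV
    return len(predictors) - 1, outputs[-1][0]              # default: final predictor

def selective_update(predictors: List[Predictor], chosen: int,
                     pc: int, actual: int) -> None:
    """Block 382: update the desired predictor and every higher-priority one."""
    for _, update in predictors[: chosen + 1]:
        update(pc, actual)              # lower-priority predictors are skipped
```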

C. System 400

Fig. 4 illustrates a preferred embodiment of
the present invention in the form of a branch predictor
system 400. Like systems 200 and 300, predictor system
400 preferably resides within a conventional host
processor which provides both data and control signals to
the system. A simplified block diagram of a host
processor 1200 supporting system 400 is shown in Fig. 12
and described below. As in the previous discussions,
"high" and "low" signals in system 400 are logic ones and
zeros, respectively.

Referring again to Fig. 4, system 400 includes
a first constituent predictor 402 and final constituent
predictor 452 employing last-direction prediction and
GSHARE prediction, respectively. Predictor 402 generates
a confidence level "State1Out.C", a prediction (also
referred to as a prediction value) "State1Out.P" and a
lookup table address "State1Out.A" for the confidence
level and prediction. Similarly, predictor 452 generates
a prediction "State2Out.P" and a lookup table address
"State2Out.A" for its prediction.

As shown in Fig. 4, system 400 includes AND
gate 480 coupled to stage 402. (As an alternative
embodiment, gate 480 may be disposed within predictor
402.) This gate receives confidence level "State1Out.C"
and outputs a control signal to multiplexer 482. Gate
480 and multiplexer 482 collectively function as a
selection circuit (e.g., like encoder 310 and multiplexer
312 of Fig. 3A). Additionally, gate 480 functions as a
"PTV tester," generating a high signal when confidence
level "State1Out.C" satisfies the PTV for system 400
(i.e., a decimal 7 in this embodiment) indicating
acceptability of the associated prediction value, and a
low signal otherwise. (As noted above, although a
prediction may be considered acceptable for use, its
relative desirability (with respect to output from other
predictors) is dependent upon the priority level of the
associated predictor.) The data inputs to multiplexer
482 are prediction "State1Out.P" from predictor 402 and
prediction "State2Out.P" from predictor 452. Based upon
the control signal generated by gate 480, a prediction
value from one of these two predictors is selected as the
system-level prediction "PredictOut" for a given branch
instruction.

1. Predictor Stage 402

Predictor 402 includes lookup table 404 (e.g.,
a 4096 x 4 RAM) whose data input is coupled to update
circuit 406, address input ("A") is coupled to
multiplexer 412 and write enable input ("WE") is coupled
to AND gate 414. Circuit 406 and AND gate 414 are
further coupled to the output of exclusive-OR gate 410.
In addition, multiplexer 412 receives input from a hash
unit 408 for prediction operation, as described below.

Inputs to predictor 402 include state inputs
("State1In.C" to update circuit 406, "State1In.P" to
exclusive-OR gate 410, and "State1In.A" to multiplexer
412), actual outcome value 110 to gate 410 and circuit
406, program counter value ("PC value") to hash unit 408
and a branch execution signal "ExeBr" to AND gate 414.
Input PC value is used to carry out a current branch
prediction operation. The remaining inputs identified
above are used for updating purposes.

During branch prediction operation of a given
branch instruction, predictor 402 inputs a 62-bit PC
value which is reduced to a 12-bit table address
"BranchPC1" through hash unit 408. Unit 408 performs a
simple masking function that allows bits [14:3] of PC
value [63:2] to pass to address input A of table 404
through multiplexer 412. Table 404 functions as a last-
direction predictor. Each entry of this table includes a
1-bit prediction and 3-bit confidence level which are
associated with one or more branch instructions through
"BranchPC1". More specifically, each conditional branch
instruction is associated with a unique PC value. Those
branch instructions having the same subset of bits making
up "BranchPC1" will access the same location in Table
404. During branch prediction, the entry in Table 404
accessed by "BranchPC1" is output from predictor 402 and
processed as described below.
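
The masking performed by hash unit 408 amounts to a shift and a mask. The sketch below assumes the PC is held as an integer byte address with bit 0 as the least significant bit; the constant and function names are hypothetical.

```python
TABLE_404_ADDRESS_BITS = 12   # 4096-entry lookup table 404

def branch_pc1(pc: int) -> int:
    """Keep bits [14:3] of the PC value as the 12-bit address "BranchPC1"."""
    return (pc >> 3) & ((1 << TABLE_404_ADDRESS_BITS) - 1)
```

Two branches whose PC values differ only above bit 14 map to the same entry, which is the sharing of table locations noted above.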

During update operations for a previously-
predicted branch instruction, predictor 402 receives
input from two primary sources: the predictor itself and
the host processor. Input signals from predictor 402
itself include a confidence level "State1In.C", stage-
specific prediction "State1In.P", and a corresponding
table address "State1In.A". (This state information was
originally output by predictor 402 during the prediction
operation for the previously-predicted branch instruction
as "State1Out.C", "State1Out.P" and "State1Out.A",
respectively.) These values have been maintained by the
host processor while the previously-predicted branch
instruction was processed and are returned to stage 402
several clock cycles later to coincide with the receipt
of actual outcome value 110 generated by the host
processor (upon execution of the previously-predicted
branch instruction) for updating purposes.

In addition to value 110, host processor
provides predictor 402 with signal "ExeBr" from an
execution unit (not shown) indicating the successful
execution of the previously-predicted branch instruction.
In both stages 402 and 452, signal ExeBr functions as a
"valid" signal for state information.

Referring to Fig. 4, prediction "State1In.P"
and actual outcome value 110 are exclusively-ORed by gate
410 to produce a mispredict signal "MisPr1". When this
signal is high, the inputs to gate 410 are different,
indicating the original prediction of predictor 402 for
the previously-predicted branch instruction was
incorrect. Signals "MisPr1" and "ExeBr" are forwarded to
AND gate 414. If both signals are high, Table 404 is
enabled via gate 414 to update an entry. Moreover, a
high input from gate 414 is used to allow address
"State1In.A" to pass through multiplexer 412 (i.e., the
corresponding table address of the previously-predicted
branch instruction) to address input A of Table 404. The
update to Table 404 for the previously-predicted branch
instruction is provided by update circuit 406.
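
The gating just described can be written out as Boolean expressions. The following is an illustrative sketch only; the function name and argument names are hypothetical, and all signals are modelled as the integers 0 and 1.

```python
def table_404_update_controls(state1in_p: int, state1in_a: int,
                              actual: int, exe_br: int, branch_pc1: int):
    """Return (MisPr1, write enable, address applied to Table 404)."""
    mis_pr1 = state1in_p ^ actual          # exclusive-OR gate 410
    write_enable = mis_pr1 & exe_br        # AND gate 414
    # Multiplexer 412: a high output from gate 414 passes the saved update
    # address; otherwise the prediction-time address from hash unit 408 is used.
    address = state1in_a if write_enable else branch_pc1
    return mis_pr1, write_enable, address
```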

Referring to Figs. 4 and 7, update circuit 406
inputs confidence level "State1In.C", "MisPr1" and actual
outcome 110 and outputs a 4-bit signal that contains an






updated 3-bit confidence level and a single bit
representing actual outcome 110. As illustrated in Fig.
7, outcome value 110 simply passes through update circuit
406. However, the updated 3-bit confidence level is a
product of update-C circuit 702. The functionality of
circuit 702 is illustrated in the pseudocode of Table 5
and Truth Table of Table 6, which define the operation of
an asymmetric saturating counter.



Table 5.

Confidence:
    conf <- (count = countMax)

Update:
    if actual = prediction then
        if count < countMax then
            count <- count + 1
        endif
    else
        if count > countDecrement then
            count <- count - countDecrement
        else
            count <- 0
        endif
    endif



Referring to the pseudocode in Table 5, a
predetermined threshold value ("PTV") is defined as
countMax which, in this embodiment, is binary 111 (i.e.,
decimal 7). Further, the variable "countDecrement" in
this embodiment is a decimal 4.

As the code in Table 5 illustrates, if actual
outcome 110 ("actual") provided by the host processor
matches prediction "State1In.P" ("prediction"; originally
provided by Table 404 as "State1Out.P"), and if the
current confidence level "State1In.C" ("count") is less
than 7, then "State1In.C" is incremented by 1. Further,
if there is a match but confidence level "State1In.C"
equals 7, then "State1In.C" remains unchanged.

However, if there is no match between actual
outcome 110 and prediction "State1In.P," and the
confidence level "State1In.C" is greater than the
variable countDecrement, then confidence level
"State1In.C" is decremented by countDecrement. Further,
if there is no match and confidence level "State1In.C" is
less than or equal to countDecrement, then the confidence
level is returned to zero for that entry in Table 404.
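
The asymmetric saturating counter of Table 5 translates directly into Python. The sketch below uses the values given for this embodiment (countMax of 7, countDecrement of 4); the function and constant names are illustrative only.

```python
COUNT_MAX = 7         # countMax, which is also the PTV in this embodiment
COUNT_DECREMENT = 4   # countDecrement

def is_confident(count: int) -> bool:
    """Confidence: conf <- (count = countMax)."""
    return count == COUNT_MAX

def update_count(count: int, actual: int, prediction: int) -> int:
    """Update: slow increment on a match, fast decrement on a mismatch."""
    if actual == prediction:
        return count + 1 if count < COUNT_MAX else count   # saturate at countMax
    if count > COUNT_DECREMENT:
        return count - COUNT_DECREMENT
    return 0                                               # clamp at zero
```

With these values, a single misprediction at full confidence drops the count from 7 to 3, and four consecutive correct predictions are then needed before the count satisfies the PTV again.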

Implementation of this logic in the form of a Truth Table
is provided in Table 6. Any conventional circuitry or
device (such as a state machine) may be used to carry out
the logic defined by Table 6.

2. Predictor Stage 452

Referring again to Fig. 4, predictor 452
functions like a GSHARE predictor (as described in
McFarling) with a one-bit saturating counter and
fourteen-bit global-history register. Specifically,
predictor 452 includes a lookup table 454 (e.g., a 16384
x 1 RAM) whose data input is coupled to update circuit
456, address input ("A") is coupled to multiplexer 462,
and write enable input ("WE") is coupled to AND gate 464.
Multiplexer 462 receives input from a hash unit 458
which, in turn, receives input "State2Out.GHIST" from a
global-history register 460. In addition, AND gate 464
receives input from NAND gate 468 and exclusive-OR gate
470, as described below.

Inputs to predictor 452 include state inputs
("State2In.P" to circuit 456, "State2In.A" to multiplexer
462, "State1In.C" to NAND gate 468, "State2In.P" to
exclusive-OR gate 470, "State2In.GHIST" to register 460
and "PredictOut" to register 460 (via latch 506 as shown
in Fig. 5)), actual outcome value 110 to gate 470, PC
value to hash unit 458 and the branch execution signal
"ExeBr" to AND gate 464. Inputs PC value and
"PredictOut" are used to carry out a current branch
prediction operation. The remaining inputs identified
above are used for updating purposes.

During branch prediction operation of a given
branch instruction, predictor 452 inputs a 62-bit PC
value which is reduced to a 14-bit table address by hash
unit 458. Referring to Fig. 6, unit 458 performs a
simple masking function in a mask circuit 602 that allows
bits [16:3] of PC value [63:2] to pass to an exclusive-OR
gate 604. The second input to gate 604 is global history
word "State2Out.GHIST". This word is generated from the
parallel output ("PO") of global history register 460, as
shown in Fig. 5.
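
The address computation of hash unit 458 can be sketched as a mask followed by an exclusive-OR. As before, the PC is assumed to be an integer byte address with bit 0 as the least significant bit, and the names below are hypothetical.

```python
GHIST_BITS = 14                       # width of global history register 460
GHIST_MASK = (1 << GHIST_BITS) - 1

def gshare_address(pc: int, ghist: int) -> int:
    """Return the 14-bit address for lookup table 454."""
    masked_pc = (pc >> 3) & GHIST_MASK      # mask circuit 602: bits [16:3]
    return masked_pc ^ (ghist & GHIST_MASK) # exclusive-OR gate 604
```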

Returning to Fig. 4, predictor 452 operates in
accordance with a GSHARE predictor having a one-bit
saturation counter and fourteen-bit register. Upon
fetching a branch instruction, hash unit 458 generates an
address "State2Out.A" for lookup table 454 based upon the
logical combination (i.e., exclusive-OR) of global
history word "State2Out.GHIST" and bits [16:3] of PC
value (i.e., [...]). In response to this address,
table 454 outputs a single-bit prediction "State2Out.P"
which, as shown in Fig. 4, is forwarded to multiplexer
482. Signal "State2Out.P" represents a stage-specific
prediction. The prediction selected by multiplexer 482
("PredictOut") is returned to shift register 460 via
latch 506 (Fig. 5) and becomes part of global history
word "State2Out.GHIST" through shift-in input "SI" upon
receipt of a "FetchBr" signal (indicating the subject
branch instruction has been fetched and decoded) from the
host processor.
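
The speculative history update through shift-in input "SI" can be sketched as below. The shift direction is an assumption made for illustration (the newest prediction enters at the least significant bit and the oldest bit is discarded); the text above does not specify it, and the function name is hypothetical.

```python
GHIST_BITS = 14
GHIST_MASK = (1 << GHIST_BITS) - 1

def shift_history(ghist: int, predict_out: int, fetch_br: int) -> int:
    """Shift PredictOut into the global history word when FetchBr is high."""
    if not fetch_br:
        return ghist                       # no branch fetched: register holds
    # Assumed direction: shift left, new bit at LSB, truncate to 14 bits.
    return ((ghist << 1) | (predict_out & 1)) & GHIST_MASK
```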

During update operations for a previously-
predicted branch instruction, predictor 452 receives
input from four primary sources: predictors 452, 402,
the host processor and system 400. Input signals from
predictor 452 include stage-specific prediction
"State2In.P", a corresponding table address "State2In.A"
and global history word "State2In.GHIST". Input signal
from system 400 includes "PredictIn". (This state
information was originally output by predictor 452 and
system 400 during the prediction operation for the
previously-predicted branch instruction as "State2Out.P",
"State2Out.A", "State2Out.GHIST" and "PredictOut",
respectively.) Like predictor 402, these values have
been maintained by the host processor while the
previously-predicted branch instruction was processed and
are returned to predictor 452 several clock cycles later
to coincide with the receipt of actual outcome value 110
generated by the host processor (upon execution of the
previously-predicted branch instruction) for updating
purposes.

In addition to value 110, host processor
provides predictor 452 with signal "ExeBr" from an
execution unit (not shown) indicating the successful
execution of the previously-predicted branch instruction.
Finally, predictor 452 receives confidence level
"State1In.C" from predictor 402 to negate update
operations for this predictor if a higher-priority
predictor (i.e., predictor 402) was selected for
prediction of the subject branch instruction.

In accordance with the logic shown in Fig. 4,
should predictor 452 mispredict during a prediction
operation in which the output of predictor 402 is used
(i.e., "State1In.C" = 7 which satisfies PTV in this
embodiment), then the output of gate 468 is low forcing
the output of gate 464 to be low. In which case, the
write enable input of table 454 is not enabled and no
update of predictor 452 will occur. However, should
predictor 452 mispredict during a prediction operation in
which the output of predictor 402 is not used (i.e.,
"State1In.C" does not equal 7, which does not satisfy PTV
in this embodiment), then the output of gate 468 is high
allowing the output of gate 464 to go high. In which case,
the write enable input of table 454 may be enabled (dependent
upon the state of other signals as described below) and
an update of predictor 452 may occur.

Referring again to the logic shown in Fig. 4,
if prediction "State2In.P" does not equal actual outcome
110, the output of exclusive OR gate 470 will be high.
After the subject branch instruction has been executed,
host processor will output signal "ExeBr" high as well.
Finally, since confidence level "State1In.C" of predictor
402 does not equal 7 (in this example) the output of
gate 468 will also be high thereby forcing the output of
gate 464 high and enabling the write operation of table
454. Table address "State2In.A" associated with the
previously-predicted branch instruction undergoing update
processing is applied to the address input of Table 454
through multiplexer 462. Finally, the original (and, in
this example, incorrect) prediction "State2In.P"
generated by predictor 452 is inverted by circuit
456 (Fig. 8) and forwarded to data input DI of Table 454
to update the associated entry.
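
The write-enable path for Table 454 can be expressed as Boolean logic. This sketch models gate 468 as a NAND over the three bits of "State1In.C", consistent with its output being low only when the confidence level equals 7; the function name is hypothetical and all signals are the integers 0 and 1 (the confidence level is 0 to 7).

```python
def table_454_update(state1in_c: int, state2in_p: int,
                     actual: int, exe_br: int):
    """Return (write enable, data input DI) for lookup table 454."""
    gate_468 = 0 if state1in_c == 0b111 else 1   # NAND of the 3 confidence bits
    gate_470 = state2in_p ^ actual               # high on a stage-452 mispredict
    write_enable = gate_468 & gate_470 & exe_br  # AND gate 464
    data_in = state2in_p ^ 1                     # circuit 456: inverter
    return write_enable, data_in
```

Whenever the write enable is high the stage mispredicted, so the inverted prediction equals the actual outcome.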

As shown in Fig. 8, circuit 456 consists of an
inverter which serves to correct an erroneous prediction
previously generated by predictor 452. This corrected
value is input to table 454 during update operations
provided predictor 402 was not originally selected for
providing the prediction, as described above. As would
be understood by one having ordinary skill in the art,
the use of an inverter as circuit 456 is optional.
Alternatively, circuit 456 may be eliminated in its
entirety and input DI of table 454 may simply receive
actual outcome value 110 generated by the host processor.
This flexibility is possible since only a single bit is
processed by circuit 456. More complex processing is
needed when multiple bits are updated, as required by
predictor 402 (i.e., a three-bit confidence level and
one-bit prediction).

In the event of a misprediction, the global
history word contained in register 460 (Fig. 5) will be
inaccurate for that and any subsequent prediction.
Accordingly, word "State2Out.GHIST" output to the host
processor during prediction of the subject branch
instruction is returned to register 460 for input via
parallel input PI as "State2In.GHIST". Referring to
Fig. 5, this updating operation is controlled by gates
502 and 504. As the circuit in Fig. 5 illustrates,
should actual outcome 110 not equal the system-level
prediction "PredictIn", exclusive OR gate 504 will output
a logic high. Concurrently, host processor will output
signal "ExeBr" to indicate the subject branch instruction
was executed. In accordance with the foregoing
discussion, signal ExeBr functions as a "valid" signal
for state information. Upon receipt of these signals,
gate 502 will output a logic high enabling the loading
operation of register 460.
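
The reload of register 460 after a system-level misprediction can be sketched as follows. Gate 502 is modelled here as an AND gate, which is an assumption based on its output going high upon receipt of both signals; the sketch restores only the saved history word, since that is all the passage above describes, and the function name is hypothetical.

```python
def restore_history(ghist: int, state2in_ghist: int,
                    predict_in: int, actual: int, exe_br: int) -> int:
    """Return the contents of register 460 after the update cycle."""
    gate_504 = predict_in ^ actual     # high when the system-level prediction was wrong
    load = gate_504 & exe_br           # gate 502 (assumed AND): enable parallel load
    # Parallel input PI receives the history word saved at prediction time.
    return state2in_ghist if load else ghist
```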

3. System-Level Operation

Referring again to Fig. 4, a system-level
description of operations will now be provided. During
branch prediction operation, one-bit predictions
"State1Out.P" and "State2Out.P" from predictors 402 and
452, respectively, are provided to multiplexer 482 for a
given conditional branch instruction. Concurrently,
predictor 402 generates 3-bit confidence level
"State1Out.C" which is forwarded to AND gate 480. The
output of this gate is applied to the control input of
multiplexer 482 and selects the prediction from either
predictor 402 or 452. If all three bits of "State1Out.C"
are high (representing a decimal 7), a logic high is
applied to the selector input of multiplexer 482 thereby
selecting "State1Out.P" as system-level prediction
"PredictOut". Alternatively, if the output of AND gate
480 is a logic low, multiplexer 482 selects prediction
"State2Out.P" and conveys this as system-level prediction
"PredictOut". This prediction is forwarded to a fetching
unit (not shown) of the host processor to control
subsequent instruction streams.

The foregoing operation of gate 480 and
multiplexer 482 is based upon a PTV for system 400
("system 400 PTV") of decimal 7. Accordingly, when the
prediction of predictor 402 for the given conditional
branch instruction has a confidence level that satisfies
system 400 PTV (i.e., a decimal value of 7), then the
prediction "State1Out.P" from this predictor is used as
the system-level prediction. However, when this
prediction is not associated with a sufficiently high
confidence level, the prediction from predictor 452,
which in this embodiment is the final predictor, is
utilized as the system-level prediction "PredictOut".
Accordingly, gate 480 determines whether the confidence
level generated in predictor 402 satisfies system 400 PTV
and, if so, outputs a logic high to indicate
acceptability.
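
The selection performed by AND gate 480 and multiplexer 482 reduces to a few bit operations. The following sketch is illustrative; the function name is hypothetical and the confidence level is taken as an integer from 0 to 7.

```python
def system_400_predict_out(state1out_c: int, state1out_p: int,
                           state2out_p: int) -> int:
    """Return system-level prediction "PredictOut"."""
    c2 = (state1out_c >> 2) & 1
    c1 = (state1out_c >> 1) & 1
    c0 = state1out_c & 1
    select = c2 & c1 & c0     # AND gate 480: high only for a confidence of 7
    # Multiplexer 482: high selects predictor 402, low selects predictor 452.
    return state1out_p if select else state2out_p
```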

Upon execution of the given conditional branch
instruction by an execution unit in the host processor,
actual outcome value 110 resulting from such execution
(e.g., a single bit indicating whether the branch is
taken or not-taken) is provided by the execution unit to
the inputs of branch prediction system 400 through a
feedback loop. If the confidence level "State1In.C" of
stage 402 did not satisfy system 400 PTV (i.e.,
"State1In.C" does not equal 7) when this prediction was
made, lookup table 454 was selected for a prediction. If
this prediction was correct, no change is made to Table
454. If, however, prediction bit "State2Out.P" from
Table 454 is incorrect (i.e., this prediction does not
equal actual outcome value 110 generated for the given
branch instruction), then a complemented "State2Out.P"
(equal to correct value 110) is written into lookup Table
454 to replace the previously stored prediction for this
entry.

Further, when confidence level "State1In.C" does not satisfy
system 400 PTV when a prediction is made, and if the
associated actual outcome value 110 equals the associated
prediction "State1Out.P" of Table 404, then confidence level
"State1In.C" is incremented (but not beyond 7) in accordance
with the pseudocode and truth table of Tables 5 and 6,
respectively. The prediction associated with this confidence
level remains unchanged. However, if actual outcome value
110 differs from prediction "State1Out.P" for the given
branch instruction, then confidence level "State1In.C" is
decremented by 4 (but not below 0), again in accordance with
the pseudocode of Table 5. Additionally, the associated
prediction "State1Out.P" is replaced with value 110
associated with the given branch instruction.

Finally, if the confidence level "State1In.C" for the given
prediction operation does satisfy system 400 PTV (i.e.,
"State1In.C" equals 7), then any update operation required
applies exclusively to lookup Table 404. Significantly, no
update is performed on Table 454 and, therefore, this table
is saved from having space unnecessarily consumed by a
branch instruction that relies on a different predictor
stage for its prediction. (As described above, gate 468 of
stage 452 prevents any update to Table 454 when confidence
level "State1In.C" equals 7 and thereby satisfies system 400
PTV.) However, global history register 460 will always be
updated in accordance with the foregoing description in the
event of a misprediction.
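The update rules described above may be sketched, purely as
an illustration, as follows. Several points are assumptions
and not statements of the embodiment: the increment step of 1
(Table 5 is not reproduced in this passage), the application
of the same counter rule to Table 404 when the confidence
level already equals 7, and all class and function names.
Global history register 460 is not modeled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

PTV = 7        # system 400 PTV (decimal 7)
CONF_MAX = 7   # confidence is not incremented beyond 7
CONF_MIN = 0   # confidence is not decremented below 0
DECREMENT = 4  # mispredict penalty stated in the description
INCREMENT = 1  # assumption: step size on a correct prediction


@dataclass
class Stage1Entry:
    """Hypothetical model of one Table 404 entry."""
    p: bool  # stored last-direction prediction
    c: int   # confidence level, 0..7


def update(entry: Stage1Entry, stage2_p: bool,
           outcome: bool) -> Tuple[Stage1Entry, Optional[bool]]:
    """Return the new Table 404 entry and the bit to write to Table 454.

    `entry` and `stage2_p` are the values read when the prediction was
    made. The second result is None when Table 454 is left untouched.
    """
    stage1_was_used = entry.c >= PTV

    # Table 404: asymmetric saturating counter (slow up, fast down).
    if entry.p == outcome:
        new_entry = Stage1Entry(entry.p, min(CONF_MAX, entry.c + INCREMENT))
    else:
        new_entry = Stage1Entry(outcome, max(CONF_MIN, entry.c - DECREMENT))

    # Table 454: written only if it supplied the prediction and was wrong.
    if stage1_was_used or stage2_p == outcome:
        stage2_write = None
    else:
        stage2_write = outcome
    return new_entry, stage2_write
```

The asymmetry (a small step up, a step of 4 down) means a
single misprediction removes a branch from the filtered set,
after which several consecutive correct predictions are
needed before the first stage is trusted again.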

Fig. 9 illustrates state output signals generated by branch
predictor system 400. The seven state output signals that
make up the composite StateOut 900 for system 400 are
concatenated together and maintained by the host processor
until needed for updating operations. In such case, StateOut
900 is returned to predictor system 400 as StateIn 1000 of
Fig. 10. Referring to Fig. 10, StateIn 1000 contains the
individual state signals required by various components of
predictor system 400 to carry out updating operations. The
output and input signals shown in Figs. 9 and 10,
respectively, bear a one-to-one correspondence. No change is
made to their values. Rather, they are simply maintained by
the host processor while a branch instruction is executed to
provide the necessary information should updating be
required. These values may be maintained through a series of
latches, cache memories or any other temporary storage.

In simulated operation, predictor 402 of branch predictor
system 400 catches all conditional branches that nearly
always go the same direction. The confidence level
("State1In.C") indicates that last-direction works well and
no other prediction is required. On a sample benchmark
(i.e., the 126.gcc program of SPECint95), 63% of branches
were predicted by predictor 402 and only 37% passed on to
final predictor 452. The mispredict rate on the branches
predicted by first predictor 402 was only 2.1%. The 37% that
passed on were of a more varied behavior and predictor 402
assigned low confidence levels to these branches. Such
branches were loaded onto predictor 452 which, as described
above, incorporates global history in its prediction.
Because the easily predicted branches of first predictor 402
did not consume table space in final predictor 452, this
final predictor is more effective since there is less risk
of false shares.

The miss rate on branch instructions serviced by final
predictor 452 was 16.1%, which resulted in a combined
overall miss rate of 7.3%.
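The combined figure follows from weighting each stage's miss
rate by its share of the branches. The short check below
uses only the percentages quoted above; the function name is
hypothetical.

```python
def combined_miss_rate(shares_and_rates):
    """Weighted average of per-stage miss rates; the shares are assumed to sum to 1."""
    return sum(share * rate for share, rate in shares_and_rates)


# 63% of branches at a 2.1% miss rate, 37% of branches at a 16.1% miss rate.
overall = combined_miss_rate([(0.63, 0.021), (0.37, 0.161)])
print(f"{overall:.1%}")  # prints 7.3%
```

That is, 0.63 x 2.1% + 0.37 x 16.1% = 1.32% + 5.96% = 7.28%,
which rounds to the stated 7.3%.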

Fig. 11 illustrates trace-driven simulation results from 21
different predictors and predictor systems against the
branch stream of the benchmark program SPECint95 126.gcc on
one of its many inputs (i.e., amptjp). Instruction and
address traces were generated using conventional methods and
fed into a branch prediction simulator program which decoded
instructions, predicted branches and verified the
predictions with the branch results to collect statistics
for branch prediction accuracy.

Referring again to Fig. 11, the types of predictors
simulated are listed in the first column and include three
bimodal predictors ("bimodal"), nine local-history
predictors ("local"), three GSHARE predictors ("gshare"),
three combined predictors as defined in McFarling ("pair"),
and three branch predictor systems utilizing filtering in
accordance with the present invention ("filter"). Columns 2
through 5 of Fig. 11 report table size in bytes ("MEM"),
number of mispredicts experienced by the predictor or
predictor system ("M"), number of branches serviced by each
predictor or predictor system ("B"), and the ratio of
mispredicts to branches ("M/B").
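As a brief illustration of how the last column relates to
the two before it, the ratio can be recomputed from the M and
B entries. The values below are those legible in Fig. 11 for
the 1024-byte bimodal predictor; the function name is
hypothetical.

```python
def mispredict_ratio(mispredicts: int, branches: int) -> float:
    """M/B column of Fig. 11: mispredicts divided by branches serviced."""
    return mispredicts / branches


# 1024-byte bimodal row: M = 26542357, B = 207515782.
ratio = mispredict_ratio(26_542_357, 207_515_782)
print(f"{ratio:.4f}")  # prints 0.1279
```

Note that B is not the same for every row: a final predictor
inside a filter system services only the branches passed on
to it, so its M/B is taken over that smaller count.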

Fig. 12 illustrates the placement of branch predictor system
400 within a host processor 1200. This processor is
pipelined with each stage being separated by latches 1250.
As shown in Fig. 12, branch predictor system 400 receives
program counter values (PC values) from program counter
register 1202. System 400 processes every PC value received
and generates a prediction value (i.e., PredictOut) whether
or not such value is actually necessary. Control signals
generated by host processor 1200, namely "FetchBr" and
"ExeBr," determine the use of a particular PredictOut value.

Referring to Fig. 12, an instruction associated with a
particular PC value will be retrieved, for example, from
instruction cache 1206 and decoded by decoder 1212
concurrently with branch-prediction processing of system
400. The decoder will determine instruction type and feed
this information back to system 400 as signal "FetchBr".
This signal, as described above, controls the shift-in
operation of global history register 460. Accordingly, a
newly computed PredictOut value is speculatively shifted
into register 460 only if the corresponding instruction (via
the PC value) is a conditional branch.
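The gating of the speculative shift-in by "FetchBr" may be
sketched as follows. This is a minimal model under stated
assumptions: the register width is a parameter because it is
not given in this passage, the new bit is assumed to enter at
the least significant end, and the function name is
hypothetical.

```python
def shift_in(ghist: int, predict_out: bool, fetch_br: bool, width: int) -> int:
    """Model of the speculative update of global history register 460.

    The history is unchanged unless FetchBr marks the fetched
    instruction as a conditional branch.
    """
    if not fetch_br:
        return ghist
    mask = (1 << width) - 1  # keep only `width` bits of history
    return ((ghist << 1) | int(predict_out)) & mask


# A 4-bit history 1011 becomes 0111 after shifting in a taken prediction.
after_branch = shift_in(0b1011, True, fetch_br=True, width=4)
# A non-branch instruction leaves the history as it was.
after_other = shift_in(0b1011, True, fetch_br=False, width=4)
```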

As shown in Fig. 12, system 400 and decoder 1212 are coupled
to a fetch control unit 1210. This unit receives a
PredictOut value from system 400 and an instruction-type
signal from decoder 1212 (i.e., a signal indicating, at
least, whether the instruction associated with the latest
PredictOut is a conditional branch). If the associated
instruction is something other than a conditional branch,
the PredictOut value can be safely ignored. However, if the
associated instruction is a conditional branch, unit 1210
will utilize the corresponding PredictOut value generated by
system 400 to fetch subsequent instructions in accordance
with the prediction.

Referring again to Fig. 12, fetch control unit 1210 may
select (via a multiplexer 1208) addresses provided by an
execution unit 1214, decoder 1212 or an incrementor 1204.
The address selected via multiplexer 1208 is forwarded to
program counter register 1202 which, in turn, will forward
the new address to system 400 and instruction cache 1206 to
begin the process again.

Once the branch instruction is executed by execution unit
1214, actual outcome value, state information and a branch
execution signal "ExeBr" are returned to system 400 to
update predictors in accordance with the foregoing
discussion of Fig. 4. More specifically, if the instruction
associated with a particular PredictOut value is not a
conditional branch instruction, then signal "ExeBr"
(generated by execution unit 1214) will prohibit any
updating of the predictors of system 400, as described
above. Alternatively, if such instruction is a conditional
branch, then "ExeBr" shall be a logic high allowing
selective updating as described above. Like the embodiment
of Fig. 3, actual outcome value 110 is also generated in
execution unit 1214 and forwarded to system 400. Further,
state information is temporarily held within host processor
1200 through any conventional means (i.e., latches, cache
memory, etc.), until the actual outcome value is available.
Thereafter, this information is also forwarded to system 400
and updating may be performed.
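The holding of state information until execution, and the
gating of updates by "ExeBr", may be modeled as a simple
queue. This is a sketch only: it assumes instructions are
resolved in the order in which they were predicted, and the
class, its methods and the `update` callback are all
hypothetical stand-ins for the latches or other temporary
storage mentioned above.

```python
from collections import deque


class StateHold:
    """Hypothetical model of the temporary storage holding StateOut until execution."""

    def __init__(self):
        self._fifo = deque()

    def on_predict(self, state_out):
        """System 400 produces state for every PC value, branch or not."""
        self._fifo.append(state_out)

    def on_execute(self, exe_br: bool, outcome, update) -> bool:
        """Return the oldest held state unchanged as StateIn; update only if ExeBr is high."""
        state_in = self._fifo.popleft()
        if exe_br:
            update(state_in, outcome)
            return True
        return False  # not a conditional branch: no predictor is updated


updates = []
hold = StateHold()
hold.on_predict("state for a non-branch")
hold.on_predict("state for a branch")
first = hold.on_execute(False, None, lambda s, o: updates.append((s, o)))
second = hold.on_execute(True, True, lambda s, o: updates.append((s, o)))
```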

While the foregoing is a complete description of the
embodiments of the invention, various modifications,
alternatives, and equivalents may be used. Accordingly, the
above description should not be taken as limiting the scope
of the invention which is defined by the appended claims.

WHAT IS CLAIMED IS:



1. A system that generates a prediction for a given
situation comprising:
    a plurality of predictors generating a plurality of
prediction values for the given situation;
    means for processing said plurality of prediction values
to produce said prediction; and
    a feedback loop coupled to said plurality of predictors
for updating only a portion of said predictors based upon an
actual outcome of the given situation.

2. The system of claim 1 wherein said prediction indicates
whether a conditional branch instruction is taken or
not-taken.

3. The system of claim 1 wherein said plurality of
predictors are each assigned a unique priority level and
said means for processing selects one of said plurality of
prediction values as said prediction, said one of said
plurality of prediction values being generated from one of
said plurality of predictors assigned a first priority
level; and
    further wherein said portion of said predictors only
includes predictors whose priority level meets or exceeds
said first priority level.

4. The system of claim 1 wherein said plurality of
predictors are each assigned a unique priority level and at
least one of said plurality of predictors is operable to
indicate acceptability of its prediction value, and further
wherein said means for processing selects one of said
plurality of prediction values as said prediction, said one
of said plurality of prediction values being generated by:
    a first predictor of said plurality of predictors when
said first predictor indicates acceptability of its
prediction value and has a highest assigned priority level
among any other predictor of said plurality of predictors
that also indicates acceptability of its respective
prediction value; and
    a second predictor of said plurality of predictors when
none of said plurality of predictors indicates acceptability
of its prediction value, said second predictor having a
lowest assigned priority level.

5. The system of claim 4 wherein said prediction indicates
whether a conditional branch instruction is taken or
not-taken.

6. The system of claim 4 wherein said first predictor
indicates acceptability of its prediction value when said
first predictor generates a confidence level that satisfies
a predetermined threshold value.

7. The system of claim 6 wherein said prediction indicates
whether a conditional branch instruction is taken or
not-taken.

8. The system of claim 6 wherein said confidence level is
generated by an asymmetric saturating counter.

9. The system of claim 8 wherein said prediction indicates
whether a conditional branch instruction is taken or
not-taken and wherein said first predictor is a
last-direction predictor.



10. The system of claim 9 wherein said second predictor is a
GSHARE predictor.

11. A predictor system that generates a desired prediction
for a given instruction comprising:
    a plurality of predictors generating a plurality of
predictions, each predictor being assigned a priority level,
and at least one predictor being operable to indicate
acceptability of its prediction;
    a selection circuit coupled to said plurality of
predictors, said circuit selecting the desired prediction
from a desired predictor, wherein said desired predictor is:
        a first predictor of said plurality of predictors
when said first predictor indicates acceptability of its
prediction and has a highest assigned priority level among
any other predictor of said plurality of predictors that
also indicates acceptability of its respective prediction;
and
        a second predictor of said plurality of predictors
when none of said plurality of predictors indicates
acceptability of its prediction, said second predictor
having a lowest assigned priority level.

12. The predictor system of claim 11 further comprising a
feedback loop coupled to said plurality of predictors for
updating only a portion of said plurality of predictors with
actual outcome information provided from execution of said
given instruction.

13. The branch predictor system of claim 11 wherein said
selection circuit comprises:
    a priority encoder coupled to said plurality of
predictors; and
    a multiplexer coupled to said priority encoder and said
plurality of predictors.

14. The branch predictor system of claim 13 wherein said
desired prediction indicates whether a conditional branch
instruction is taken or not taken.

15. The branch predictor system of claim 14 wherein said
first predictor indicates acceptability of its prediction
after said first predictor generates a confidence level that
satisfies a predetermined threshold value.

16. The branch predictor system of claim 15 wherein said
confidence level is generated by an asymmetric saturating
counter.

17. A method for generating a prediction for a given
instruction comprising the steps of:
    providing a plurality of predictors for receiving
address information of the instruction;
    producing a prediction value by at least one predictor
of said plurality of predictors;
    processing said prediction value to generate said
prediction; and
    updating only a portion of said predictors with actual
outcome information provided from execution of said given
instruction.

18. The method of claim 17 further comprising the step of
assigning a priority level to each of said plurality of
predictors wherein said at least one predictor has a first
priority level and said portion of said predictors only
includes predictors whose priority level meets or exceeds
said first priority level.

19. The method of claim 17 further comprising the steps of:
    assigning a priority level to each of said plurality of
predictors; and
    indicating acceptability of said prediction value by
said at least one predictor wherein said at least one
predictor has a highest assigned priority level among any
other predictor of said plurality of predictors that also
indicates acceptability of its respective value.

20. The method of claim 19 wherein said step of indicating
acceptability further comprises the steps of:
    generating a confidence level by said at least one
predictor; and
    determining whether said confidence level satisfies a
predetermined threshold value.



WO 99/14667                                  PCT/US98/19674





SUBSTITUTE SHEET (RULE 26) 






[FIG. 3A (sheet 3/10). Labels recoverable from the drawing:
BranchPC 104; PREDICTOR 1 (302), PREDICTOR 2 (304),
PREDICTOR n-1 (306), PREDICTOR n (308), with signal lines
labeled S, CI and P; PRIORITY ENCODE (D2, Dn-1; Q1, Q0);
StateIn; StateOut; PredictOut 313; PROGRAM COUNTER 103;
ACTUAL OUTCOME VALUE 110; EXECUTION UNIT; HOST PROCESSOR.
Legend: S = STATE FOR UPDATE, CI = CONFIDENCE INDICATOR,
P = PREDICTION.]



[FIG. 3B (sheet 4/10): flowchart. Box labels, in order of
appearance: PROVIDING A PLURALITY OF PREDICTORS; ASSIGNING
PRIORITY LEVELS; PROVIDING A CONDITIONAL BRANCH INST
ADDRESS; GENERATING BRANCH PREDICTORS AND CONFIDENCE LEVELS;
decision ANY CONFIDENCE LEVEL SATISFYING PTV? (YES branch
labeled); SELECT PREDICTOR WITH HIGHEST PRIORITY LEVEL
SATISFYING PTV; SELECT FINAL PREDICTOR; SELECTIVELY UPDATE
PREDICTORS. Reference numerals on the sheet: 370, 371, 372,
373, 374, 376, 378, 380.]






[FIG. 5 (sheet 6/10). Signal labels recoverable from the
drawing: PredictOut, FetchBr, State2In.GHIST, ExeBr, ACTUAL
OUTCOME VALUE 110, PredictIn, State2Out.GHIST.]

[FIG. 6 (sheet 6/10). Labels recoverable from the drawing:
458, BranchPC, PC[63:2], State2Out.GHIST, State2Out.A.]



[FIG. 7 (sheet 7/10). Signal labels recoverable from the
drawing: State1In.C, MisPr1, ACTUAL OUTCOME VALUE 110.]

[FIG. 8 (sheet 7/10). Labels recoverable from the drawing:
456, State2In.P.]



[FIG. 9 (sheet 8/10). Label recoverable from the drawing:
State1Out.A.]



9/10

TOTAL INSTRUCTIONS:   1473778275
TOTAL BRANCHES:       207515782
BCOUNT INSTRUCTIONS:  0
COUNTED BRANCHES:     0

PREDICTOR TYPE                              MEM         M          B     M/B
bimodal['12]                               1024  26542357  207515782  0.1279
bimodal['13]                               2048  25429690  207515782  0.1225
bimodal['14]                               4096  24740522  207515782  0.1192
local[9x11_'11]                            1216  33353900  207515782  0.1607
local[10x11_'11]                           1920  27249892  207515782  0.1313
local[11x12_'12]                           4096  22587992  207515782  0.1088
local[10x8_'8]                             1088  30706368  207515782  0.1480
local[11x8_'8]                             2112  26618998  207515782  0.1283
local[12x8_'8]                             4160  23457556  207515782  0.1130
local[11x4_'4]                             1028  31127832  207515782  0.1500
local[12x4_'4]                             2052  27950198  207515782  0.1347
local[13x4_'4]                             4100  26534407  207515782  0.1279
gshare[8(<<4)_'12]                         1025  25201755  207515782  0.1214
gshare[9(<<4)_'13]                         2050  20877237  207515782  0.1006
gshare[10(<<4)_'14]                        4098  17688778  207515782  0.0852
pair['10](bimodal('10),gshare(11_'11))     1026  22780786  207515782  0.1098
  bimodal['10]                              256  32693264  207515782  0.1575
  gshare[11_'11]                            514  36483651  207515782  0.1758
pair['11](bimodal('11),gshare(12_'12))     2050  18316187  207515782  0.0883
  bimodal['11]                              512  29083070  207515782  0.1401
  gshare[12_'12]                           1026  29094774  207515782  0.1402
pair['12](bimodal('12),gshare(13_'13))     4098  15127601  207515782  0.0729
  bimodal['12]                             1024  26542357  207515782  0.1279
  gshare[13_'13]                           2050  23695094  207515782  0.1142
filter[last('10):3-4.gshare(12_'12x1)]     1026  21551376  207515782  0.1039
  last['10]                                 512  40793795  207515782  0.1966
  gshare[12_'12x1]                          514  17571673   94533972  0.1859
filter[last('11):3-4.gshare(13_'13x1)]     2050  17138826  207515782  0.0826
  last['11]                                1024  36685637  207515782  0.1768
  gshare[13_'13x1]                         1026  13514386   84998341  0.1590
filter[last('12):3-4.gshare(14_'14x1)]     4098  14553073  207515782  0.0701
  last['12]                                2048  34283339  207515782  0.1652
  gshare[14_'14x1]                         2050  11202582   78688336  0.1424

FIG. 11






International application No.
PCT/US98/19674

A. CLASSIFICATION OF SUBJECT MATTER
IPC(6): G06F 9/38
US CL: 395/586, 587
According to International Patent Classification (IPC) or to both national classification and IPC

B. FIELDS SEARCHED

Minimum documentation searched (classification system followed by classification symbols)
U.S.: 395/586, 587

Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched

Electronic data base consulted during the international search (name of data base and, where practicable, search terms used)
IEEE DATABASE (1989-PRESENT), APS

C. DOCUMENTS CONSIDERED TO BE RELEVANT

Category*    Citation of document, with indication, where appropriate, of the relevant passages    Relevant to claim No.

EVERS ET AL., USING HYBRID BRANCH PREDICTORS TO
IMPROVE BRANCH PREDICTION ACCURACY IN THE
PRESENCE OF CONTEXT SWITCHES, 23RD ANNUAL
INTERNATIONAL SYMPOSIUM ON COMPUTER
ARCHITECTURE, ACM, MAY 1996, PAGES 3-11, ESPECIALLY
PAGES 5-10.
Relevant to claim No.: 1-20

CHANG ET AL., ALTERNATIVE IMPLEMENTATIONS OF
HYBRID BRANCH PREDICTORS, PROCEEDINGS OF THE
28TH ANNUAL INTERNATIONAL SYMPOSIUM ON
MICROARCHITECTURE, 1995, IEEE, PAGES 252-257,
ESPECIALLY PAGES 253 AND 255-256.
Relevant to claim No.: 1-20




Date of the actual completion of the international search
19 NOVEMBER 1998

Date of mailing of the international search report
14 JAN 1999

Name and mailing address of the ISA/US
Commissioner of Patents and Trademarks
Box PCT
Washington, D.C. 20231
Facsimile No. (703) 305-3230

Authorized officer
WILLIAM M. TREAT
Telephone No. (703) 305-9699

Form PCT/ISA/210 (second sheet) (July 1992)







